An energy-aware and Q-learning-based area coverage for oil pipeline monitoring systems using sensors and Internet of Things

Pipelines are the safest means of transporting oil and gas. However, environmental effects and sabotage by hostile actors cause corrosion and decay of pipelines, which lead to financial and environmental damage. Today, new technologies such as the Internet of Things (IoT) and wireless sensor networks (WSNs) can provide solutions for monitoring oil pipelines and detecting corrosion in time. Coverage is a fundamental challenge in pipeline monitoring systems that must detect and resolve oil leakage and pipeline corrosion in time. To ensure appropriate coverage in pipeline monitoring systems, one solution is to design a node scheduling mechanism that reduces energy consumption. In this paper, we propose a reinforcement learning-based area coverage technique called CoWSN to intelligently monitor oil and gas pipelines. In CoWSN, the sensing range of each sensor node is converted to a digital matrix to estimate the overlap of this node with its neighboring nodes. Then, a Q-learning-based scheduling mechanism is designed to determine the activity time of sensor nodes based on their overlap, energy, and distance to the base station (BS). Finally, CoWSN predicts the death time of sensor nodes and replaces them at the right time, so that data transmission between sensor nodes and the BS is not disrupted. CoWSN is simulated using NS2 and compared with three area coverage schemes, namely the scheme of Rahmani et al., CCM-RL, and CCA, in terms of the average number of active sensor nodes, coverage rate, energy consumption, and network lifetime. The simulation results show that CoWSN outperforms the other methods.


Scientific Reports | (2022) 12:9638 | https://doi.org/10.1038/s41598-022-12181-w | www.nature.com/scientificreports/
On the other hand, network lifetime is a very important issue in network coverage because it specifies how long the pipeline network can work properly while meeting the coverage requirements. Therefore, network lifetime is an important criterion for evaluating smart pipelines. A suitable solution for this issue is to design a scheduling mechanism for sensor nodes. In such a mechanism, only part of the sensor nodes are active in each scheduling period, while the other nodes sleep to reduce their energy consumption. The active sensor nodes must guarantee the desired partial coverage rate. Scheduling in the network can be considered an optimization problem. However, real-world optimization problems are very complicated because they are large and dynamic. As a result, computational intelligence-based methods such as reinforcement learning (RL) are needed to solve them 28,29. RL algorithms are rapidly gaining popularity because they can find optimal solutions in a reasonable time. RL is suitable for solving problems such as routing, data aggregation, and coverage in WSN and IoT. RL is an appropriate computational intelligence tool that learns an optimal policy through interaction with the environment 30,31. In this paper, we use RL to achieve a proper coverage rate with the minimum number of active sensor nodes while improving energy consumption in the network.
In this paper, we propose an area coverage method called CoWSN to intelligently monitor oil and gas pipelines; CoWSN balances energy consumption and increases coverage quality in the network. The main contributions of CoWSN are as follows:
• In CoWSN, the sensing range of each sensor node is converted to a digital matrix using a new, efficient, and distributed technique. The digital matrix helps us calculate the overlap of this node with its neighboring nodes using geometry.
• In CoWSN, a Q-Learning-based scheduling mechanism is presented to calculate the activity time of sensor nodes based on three parameters: the overlap between a sensor node and its neighboring nodes, energy, and distance to the BS.
• In CoWSN, the replacement time of nodes is predicted so that the data transmission process between sensor nodes and the base station is not disrupted.
The rest of the paper is organized as follows. "Related works" reviews related work. "Basic concepts" summarizes the basic concepts used in the proposed method. "System model" describes the system model of CoWSN. "Proposed scheme" explains our proposed method in detail. "Simulation and result evaluation" compares the simulation results of CoWSN with other coverage schemes. Finally, "Conclusion" concludes the paper.

Related works
In 32, the CCM-RL technique is presented to maintain connectivity and coverage in WSNs using reinforcement learning. CCM-RL tries to maximize the coverage rate and maintain connectivity while remaining energy efficient. In this method, nodes execute a learning algorithm to learn their optimal activity, so CCM-RL activates only a subset of nodes at any given time. This reduces energy consumption in the network and provides an appropriate coverage rate and suitable connectivity. CCM-RL is a dynamic, distributed, and scalable coverage method. However, its scheduling mechanism considers only two parameters, distance and coverage rate, and ignores energy. CCM-RL also has high delay.
In 5, an area coverage approach based on fuzzy logic (FL) and the shuffled frog-leaping algorithm (SFLA) is proposed. This approach balances energy consumption, increases network lifetime, and improves coverage quality. It calculates the overlap between sensor nodes using a distributed digital matrix-based approach and then applies a fuzzy scheduling mechanism, which considers three parameters (overlap, residual energy, and the distance between each node and the base station) to determine the activity time of each sensor node. Moreover, it uses a strategy to predict the death of sensor nodes and prevent holes in the network. Finally, it uses SFLA to find the best replacement strategy, which covers the holes created in the network and maximizes the coverage rate. This coverage method is distributed, dynamic, and scalable. However, it has high communication overhead due to its fuzzy scheduling mechanism, and the use of SFLA increases delay in the network.
In 33, the CCA technique is offered in two forms, distributed and centralized, for homogeneous WSNs. CCA solves the k-coverage problem when deploying sensor nodes in the network, trying to use the lowest number of sensor nodes and increase network lifetime. CCA designs a scheduling process that selects a subset of nodes to cover the desired area. Centralized CCA can be implemented dynamically or statically. The dynamic version has greater time complexity than the static version but achieves a better coverage rate. In general, distributed CCA is more scalable than centralized CCA because it relies on local information; however, distributed CCA has high communication overhead.
In 34, a partial coverage technique based on learning automata (PCLA) is suggested for WSNs. PCLA solves the coverage problem using the minimum number of sensor nodes while maintaining appropriate connectivity. It uses learning automata (LA) to determine the activity time of nodes and creates a backbone in the network, so that a subset of the sensor nodes is selected to cover the RoI and maintain connectivity. PCLA is a distributed, dynamic, and scalable method. However, it has high overhead.
In 35 […] VFA optimization. MIGA achieves a high-quality solution for maximizing the coverage rate. However, MIGA is a centralized coverage method with high computational overhead and is not scalable. In 36, the maximum coverage sets scheduling (MCSS) mechanism is presented for WSNs. MCSS schedules the coverage sets and improves network lifetime. It uses a greedy algorithm to search the problem space and has acceptable time and computational complexity. MCSS assumes that the coverage sets and the time slots of nodes are predetermined. It is centralized and not scalable, and it considers only the activity time of nodes, ignoring other parameters such as energy and distance.
In 37, a barrier coverage method is offered for homogeneous wireless sensor networks. It uses the minimum number of sensor nodes to cover the network. The authors calculate the overlap between sensor nodes based on the angle between their sensing ranges. Furthermore, the method detects failed nodes and covers the holes created in the network. Its most important advantage is a proper coverage rate with the fewest sensor nodes. However, the authors consider only two parameters, distance and overlap. This method also cannot predict the death time of nodes, and it increases latency in the network.
In 38, two coverage schemes are proposed for heterogeneous wireless sensor networks, based on the improved cuckoo search (ICS) algorithm and the chaotic flower pollination algorithm (CFPA). These methods try to reduce the implementation cost and energy consumption in the network. They use a fitness function that considers only one parameter, the overlap between nodes, and could be improved by considering more parameters. ICS and CFPA have low computational complexity; they are simple and fast (high convergence speed) and can achieve a high-quality solution. However, these schemes are static: they search once for the best replacement strategy for sensor nodes, and this strategy stays fixed over the network lifetime.
In 39, two area coverage schemes are suggested for WSNs, based on the genetic algorithm (GA) and particle swarm optimization (PSO). The authors assume that the network contains a number of obstacles, define the coverage problem in this network, and use GA and PSO to address it. The main advantage of these approaches is that they obtain the highest coverage rate with reasonable computational overhead. However, they consider only one parameter (the overlap between sensor nodes) in the cost function and ignore others such as energy. They also focus only on maximizing area coverage and do not consider network lifetime. Finally, they are centralized and static, which reduces scalability.
In 40, a mathematical model is proposed to solve the coverage problem in WSNs. This method moves sensor nodes toward low-density areas to maximize the coverage rate and distributes the sensor nodes evenly, using an improved version of the virtual force algorithm. It has low computational complexity. However, it is static, centralized, and not scalable, and it aims only to maximize area coverage without considering network lifetime.

Basic concepts
In this section, we briefly describe a well-known reinforcement learning technique, Q-Learning, because the proposed method uses it to design the scheduling mechanism.
Q-Learning. Reinforcement learning (RL) allows machines or agents to learn their ideal behavior in a particular situation based on previous experience 28. An RL-based model learns through interaction with the environment and collects information to perform a specific activity. Q-Learning is a model-free, off-policy RL algorithm that helps an agent learn its optimal actions. In this algorithm, state-action pairs are stored in a table called the Q-table, which receives a state-action pair as input and returns its Q-value as output. The agent adjusts its action strategy according to the reward received as feedback from the environment, with the goal of choosing, in each state, the action with the highest Q-value. During learning, the agent evaluates how suitable an action is in the current state so that it can choose a better action in the next iteration. The Q-value is updated in each iteration using Eq. (1):
$$Q(s_t, a_t) \leftarrow (1-\alpha)\,Q(s_t, a_t) + \alpha\left[r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a)\right] \tag{1}$$
where t is the current iteration, a ∈ A is an action from the action set A, and r_{t+1} is the reward received by the agent after performing action a_t in state s_t. When the agent performs a_t, its state changes from s_t to s_{t+1}. max_a Q(s_{t+1}, a) is the maximum Q-value over all actions a available in the next state s_{t+1}. 0 < α ≤ 1 is the learning rate: as α approaches 0, the agent learns almost nothing from new experience, and with α = 1 it keeps only the most recent experience. In the proposed method, we set α = 0.1. Also, 0 < γ ≤ 1 is the discount factor, which determines how strongly future rewards are weighted relative to the immediate reward. We set γ = 0.7 in the proposed method.
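Assuming Eq. (1) is the standard tabular Q-learning update, the following minimal sketch applies one update with the paper's α = 0.1 and γ = 0.7. The states "s0"/"s1" and the two actions "ON"/"OFF" are hypothetical toy values, not CoWSN's actual state and action spaces.

```python
from collections import defaultdict

ALPHA = 0.1  # learning rate alpha used in the paper
GAMMA = 0.7  # discount factor gamma used in the paper


def q_update(q_table, state, action, reward, next_state, actions,
             alpha=ALPHA, gamma=GAMMA):
    """Apply one tabular Q-learning update (Eq. (1)) and return the new value."""
    # Best Q-value reachable from the next state over all candidate actions.
    best_next = max(q_table[(next_state, a)] for a in actions)
    old = q_table[(state, action)]
    new = (1 - alpha) * old + alpha * (reward + gamma * best_next)
    q_table[(state, action)] = new
    return new


# Toy Q-table keyed by (state, action); unseen pairs default to 0.0.
q = defaultdict(float)
actions = ["ON", "OFF"]          # hypothetical two-action example
q[("s1", "ON")] = 2.0            # pretend the next state already looks good
q_new = q_update(q, "s0", "ON", reward=1.0, next_state="s1", actions=actions)
print(round(q_new, 4))  # 0.1 * (1.0 + 0.7 * 2.0) = 0.24
```

With a small α, a single reward moves the estimate only 10% of the way toward the new target, which is why learning needs several iterations to settle.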
Reinforcement learning offers two common techniques for balancing exploration and exploitation, ε-greedy and Boltzmann (softmax) selection 28. In ε-greedy, the agent selects a random action with a small probability ε (exploration) and otherwise selects the action with the highest Q-value (exploitation). In the proposed method, we use the ε-greedy technique to balance exploration and exploitation.
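A minimal ε-greedy selector is sketched below. The paper does not report its ε, so ε = 0.1 and the three Q-values are illustrative assumptions; a seeded `random.Random` keeps the demo reproducible.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Return an action index using epsilon-greedy selection."""
    if rng.random() < epsilon:
        # Exploration: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploitation: pick the action with the highest current Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(42)          # fixed seed for a reproducible demo
q_values = [0.2, 0.9, 0.5]       # hypothetical Q-values of three actions
picks = [epsilon_greedy(q_values, epsilon=0.1, rng=rng) for _ in range(1000)]
print(f"share of greedy picks: {picks.count(1) / len(picks):.3f}")
```

In expectation the greedy action is chosen with probability 1 − ε + ε/3 (about 0.93 here), because random exploration can also land on it.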

System model
This section consists of four subsections: network model, energy model, sensing model, and communication model. In the following, we describe each subsection in detail.
Network model. In the proposed method, we consider a heterogeneous network with N sensor nodes, meaning that the nodes have different energy resources, sensing ranges, and communication ranges. They are randomly distributed in the network environment. Sensor nodes are equipped with a positioning system, so each node knows its spatial coordinates (x_i, y_i). The location of the base station (x_BS, y_BS) is also known to every node. Each sensor node knows its remaining energy (E_residual) at any moment. Two sensor nodes within each other's communication range can communicate directly through a wireless channel. The network includes one base station, N_Static static sensor nodes, and N_Dynamic mobile sensor nodes, where N = N_Static + N_Dynamic. The tasks of each node type are as follows:
• Base station (BS): receives and processes the information sent by sensor nodes.
• Static sensor nodes: sense the RoI and send the sensed data to the base station.
• Mobile sensor nodes: cover holes caused by the death of sensor nodes in the network.
Energy model. In CoWSN, when a transmitter node SN_i sends k bits of data to a receiver node SN_j at distance d, SN_i calculates the energy consumed for sending the k bits according to Eq. (4):
$$E_{TX}(k, d) = \begin{cases} k\,E_{elec} + k\,E_{fs}\,d^{2}, & d < d_0 \\ k\,E_{elec} + k\,E_{mp}\,d^{4}, & d \ge d_0 \end{cases} \tag{4}$$
SN_j calculates the energy consumed for receiving the k bits using Eq. (5):
$$E_{RX}(k) = k\,E_{elec} \tag{5}$$
where E_elec is the per-bit energy used by the transmitter/receiver circuit, and E_fs and E_mp are the energies required by the transmitter amplifier in the free-space and multipath models, respectively. Equation (6) computes d_0, the threshold transmission distance:
$$d_0 = \sqrt{E_{fs} / E_{mp}} \tag{6}$$
Sensing model. CoWSN uses the binary sensing model, also called the 0/1 model. In this model, each SN_i at coordinates (x_i, y_i) senses a circular area of radius RS_i. Consider a point P = (x_p, y_p) in the RoI. SN_i senses P only when their Euclidean distance is at most RS_i; in this case, P is inside the sensing range of SN_i. Otherwise, P is outside the sensing range of SN_i and cannot be covered by this node. This is expressed in Eq. (7):
$$\mathrm{Cov}(SN_i, P) = \begin{cases} 1, & d(SN_i, P) \le RS_i \\ 0, & \text{otherwise} \end{cases} \tag{7}$$
where d(SN_i, P), the distance between SN_i and P, is calculated by Eq. (8):
$$d(SN_i, P) = \sqrt{(x_i - x_p)^2 + (y_i - y_p)^2} \tag{8}$$
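The energy model in Eqs. (4) to (6) is the standard first-order radio model, and Eqs. (7) and (8) describe binary disk sensing. The sketch below evaluates both. The constants E_ELEC, E_FS, and E_MP are illustrative values commonly used in the WSN literature, assumed here because the paper's own settings are not given in this section; the function names are hypothetical.

```python
import math

# Illustrative first-order radio constants (assumed, not from the paper).
E_ELEC = 50e-9        # J/bit, transmitter/receiver electronics
E_FS = 10e-12         # J/bit/m^2, free-space amplifier
E_MP = 0.0013e-12     # J/bit/m^4, multipath amplifier
D0 = math.sqrt(E_FS / E_MP)  # threshold distance, Eq. (6)


def tx_energy(k_bits, d):
    """Energy to send k bits over distance d (Eq. (4))."""
    if d < D0:
        return k_bits * E_ELEC + k_bits * E_FS * d ** 2
    return k_bits * E_ELEC + k_bits * E_MP * d ** 4


def rx_energy(k_bits):
    """Energy to receive k bits (Eq. (5))."""
    return k_bits * E_ELEC


def covers(node_xy, rs, point_xy):
    """Binary (0/1) sensing model, Eqs. (7) and (8)."""
    return 1 if math.dist(node_xy, point_xy) <= rs else 0


print(f"d0 = {D0:.1f} m")
print(f"TX 4000 bits over 50 m: {tx_energy(4000, 50):.2e} J")
print(covers((0, 0), 25, (15, 20)), covers((0, 0), 25, (20, 20)))
```

Because d_0 is defined as the point where E_fs·d² equals E_mp·d⁴, the two branches of Eq. (4) meet continuously at d_0.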

Proposed scheme
Our scheme (CoWSN) is an area coverage technique. It increases coverage quality, balances energy consumption in the network, and improves network lifetime. CoWSN consists of three parts:
• Converting sensing ranges of sensor nodes to digital matrix
• Q-learning-based scheduling mechanism
• Node replacement
Converting sensing ranges of sensor nodes to digital matrix. In this part, a technique is introduced for calculating the overlap of sensor nodes with their neighbors. According to this technique, the sensing range of a node is converted into a digital matrix, i.e., a matrix whose elements are all zero or one. In the following, we describe this process in detail. First, the sensing range of a node (for example, SN_i) is converted into a digital matrix. In this process, the node's position is taken as the pole of a polar coordinate system, and its sensing range is represented as a circular area C_i with sensing radius RS_i. C_i is divided into n sectors (sec_p, p = 1, 2, ..., n) and m smaller concentric circles (c_q, q = 1, 2, ..., m), so that n = 2π/Δθ and m = RS_i/ΔR, where Δθ is the angle of each sector sec_p and ΔR is the radial spacing between consecutive circles. Each circle c_q has a radius r_q; this process is shown in Eqs. (10) and (11). Figure 2 shows an example in which C_i is divided into 16 sectors and 8 smaller circles. As Fig. 2 shows, this process partitions C_i into small sections, which we call rectangular sections. Δθ and ΔR can be adjusted to the problem requirements: choosing them closer to zero makes the result more accurate but increases memory consumption, because the matrix has m × n elements. C_i is then converted to an m × n digital matrix whose rows correspond to the smaller circles (c_q) and whose columns correspond to the sectors (sec_p). The matrix elements a_qp (q = 1, ..., m; p = 1, ..., n) represent the rectangular sections, and each a_qp is either one or zero.
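The polar partition can be sketched as follows. Because Eqs. (10) and (11) are not reproduced here, the sketch assumes equal radial spacing r_q = q·ΔR with ΔR = RS_i/m; `polar_grid` is a hypothetical helper. It shows the m × n layout and checks that the cells (annular sectors) tile the whole sensing disk.

```python
import math


def polar_grid(rs, n_sectors, m_rings):
    """Split a sensing disk of radius rs into m rings x n sectors.

    Assumes equal angular width 2*pi/n and equal radial width rs/m,
    i.e. r_q = q * rs / m. Returns (radii, sector angle, cell areas).
    """
    d_theta = 2 * math.pi / n_sectors
    d_r = rs / m_rings
    radii = [q * d_r for q in range(1, m_rings + 1)]  # r_1 .. r_m
    # Cell (q, p) lies between r_{q-1} and r_q; all cells in a ring share
    # the same area 0.5 * d_theta * (r_q^2 - r_{q-1}^2).
    areas = [[0.5 * d_theta * (radii[q] ** 2 - (radii[q - 1] if q else 0.0) ** 2)
              for _ in range(n_sectors)]
             for q in range(m_rings)]
    return radii, d_theta, areas


radii, d_theta, areas = polar_grid(rs=30.0, n_sectors=16, m_rings=8)
total = sum(sum(row) for row in areas)
print(len(areas), len(areas[0]))                  # 8 16 (m x n, as in Fig. 2)
print(math.isclose(total, math.pi * 30.0 ** 2))   # True: cells tile the disk
```

Halving Δθ and ΔR quadruples the number of cells, which is the accuracy versus memory trade-off mentioned above.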
Specifically, a_qp equals one when the corresponding section is covered by the sensing range of at least one neighboring node. Otherwise, a_qp equals zero; in particular, a_qp is zero when the section is not fully covered by neighboring nodes. An example of this digital matrix is shown in Fig. 3.
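As a rough cross-check of the definition of a_qp, the digital matrix can be approximated by sampling. This is a swapped-in sampling technique, not the paper's geometric procedure described next, and it assumes that a cell is set to 1 only when a single neighbor covers all of its k × k sample points; `cell_samples` and `digital_matrix` are hypothetical names.

```python
import math


def cell_samples(center, r_in, r_out, th0, th1, k=3):
    """k x k sample points spread over one annular-sector cell."""
    cx, cy = center
    pts = []
    for i in range(k):
        r = r_in + (r_out - r_in) * i / (k - 1)
        for j in range(k):
            th = th0 + (th1 - th0) * j / (k - 1)
            pts.append((cx + r * math.cos(th), cy + r * math.sin(th)))
    return pts


def digital_matrix(center, rs, n, m, neighbors, k=3):
    """Approximate DigitC_i: a[q][p] = 1 if one neighbor disk contains all
    sample points of cell (q, p). neighbors: list of ((x, y), radius)."""
    d_theta, d_r = 2 * math.pi / n, rs / m
    matrix = [[0] * n for _ in range(m)]
    for q in range(m):
        for p in range(n):
            pts = cell_samples(center, q * d_r, (q + 1) * d_r,
                               p * d_theta, (p + 1) * d_theta, k)
            for (nx, ny), nrs in neighbors:
                if all(math.dist((nx, ny), pt) <= nrs for pt in pts):
                    matrix[q][p] = 1
                    break
    return matrix


demo = {
    "concentric, larger": [((0.0, 0.0), 35.0)],
    "far away": [((100.0, 0.0), 30.0)],
    "partial": [((40.0, 0.0), 35.0)],
}
for label, nbrs in demo.items():
    mat = digital_matrix((0.0, 0.0), rs=30.0, n=16, m=8, neighbors=nbrs)
    print(label, sum(map(sum, mat)), "of", 16 * 8, "cells set to 1")
```

Sampling can misjudge cells whose boundary barely crosses a neighbor's circle; finer k reduces, but does not remove, that error.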
Figure 2. Dividing the sensing range of a sensor node into small rectangular sections.
In the following, we describe how to determine the value of a_qp. In the first step, each sensor node SN_i broadcasts a Hello message to its neighbors containing its identifier (ID_i), spatial coordinates (x_i, y_i), remaining energy (E_residual_i), and sensing radius (RS_i). Each node stores the information of its neighbors in a table called Table_neighbor, shown in Table 1. SN_i can then obtain the Euclidean distance between itself and each neighbor SN_j according to Eq. (12):
$$d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \tag{12}$$
where (x_i, y_i) and (x_j, y_j) are the coordinates of SN_i and SN_j, respectively. If d_ij ≤ RS_j − RS_i or RS_i − RS_j < d_ij < RS_i + RS_j, the two nodes overlap in their sensing ranges. If d_ij ≤ RS_j − RS_i, C_i lies entirely inside C_j, so all a_qp (q = 1, ..., m; p = 1, ..., n) are one. Otherwise, if RS_i − RS_j < d_ij < RS_i + RS_j, the angle of C_j with respect to the pole (i.e., SN_i) in the polar coordinate system is calculated using Eq. (13) (see Fig. 4). To obtain the values of a_qp in the digital matrix and calculate the overlap between SN_i and SN_j, we apply the following rules.
• If d_ij ≥ RS_j + r_q (1 ≤ q ≤ 8 in this example), c_q and all smaller circles lie outside C_j. As a result, a_qp is zero in the corresponding rows.
• If d_ij ≤ RS_j − r_q (1 ≤ q ≤ 8 in this example), c_q and all smaller circles lie inside C_j. Therefore, a_qp is one in the corresponding rows.
• If RS_j − r_q < d_ij < RS_j + r_q, c_q and C_j partially overlap, and the overlap is calculated as follows:
- As shown in Fig. 5, consider the triangle whose vertices are (x_i, y_i), (x_j, y_j), and an intersection point of c_q and C_j, and compute the lengths of its three sides (d_ij, r_q, and RS_j).
- The angle θ_1 = θ_2 at SN_i is obtained using the law of cosines:
$$\cos\theta_1 = \frac{d_{ij}^2 + r_q^2 - RS_j^2}{2\, d_{ij}\, r_q}$$
- Equation (16) computes the overlapping area (γ_q) between c_q and C_j.
- Equation (17) then computes a_qp for row c_q and sector sec_p.
This process is repeated for all c_q (1 ≤ q ≤ 8 in this example) to calculate every a_qp (q = 1, ..., m; p = 1, ..., n). Algorithm 1 presents the pseudocode of this process.
Q-learning-based scheduling mechanism. In this section, our goal is to design a Q-learning-based scheduling mechanism in which each sensor node independently and automatically learns the best ON/OFF time slots in each scheduling round (T_Scheduling) to maximize coverage rate and network lifetime. The learning process begins immediately after the sensor nodes are deployed. At the start time (t = 0), we initialize the Q-learning parameters, namely the learning rate (α), the discount factor (γ), and the Q-values, and all sensor nodes are active. The time slots are then updated during learning according to the Q-learning algorithm until the optimal response is reached.
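Returning to the overlap rules listed above for filling the digital matrix, each row decision reduces to comparing d_ij with RS_j ± r_q, plus one law-of-cosines angle in the partial case. The sketch below assumes that the angle of C_j from Eq. (13) is the polar direction of SN_j computed with `atan2`; `classify_ring`, `half_angle`, and `direction` are hypothetical names.

```python
import math


def classify_ring(d_ij, r_q, rs_j):
    """Relation of circle c_q (radius r_q around SN_i) to disk C_j (radius rs_j)."""
    if d_ij >= rs_j + r_q:
        return "outside"  # c_q and all smaller circles miss C_j: rows of zeros
    if d_ij <= rs_j - r_q:
        return "inside"   # c_q and all smaller circles lie in C_j: rows of ones
    return "partial"      # c_q crosses the boundary of C_j


def half_angle(d_ij, r_q, rs_j):
    """Law of cosines: angle at SN_i between the direction to SN_j and an
    intersection point of c_q with C_j (theta_1 = theta_2 by symmetry)."""
    if d_ij == 0:
        # Concentric case that reaches here only when r_q > rs_j:
        # circle c_q then lies entirely outside C_j.
        return 0.0
    cos_t = (d_ij ** 2 + r_q ** 2 - rs_j ** 2) / (2 * d_ij * r_q)
    # Clamp: rounding can push cos_t slightly outside [-1, 1], and when C_j
    # sits inside c_q without touching it, cos_t > 1 and the arc is empty.
    return math.acos(max(-1.0, min(1.0, cos_t)))


def direction(xi, yi, xj, yj):
    """Polar angle of SN_j as seen from the pole SN_i, in [0, 2*pi)."""
    return math.atan2(yj - yi, xj - xi) % (2 * math.pi)


# Partial case: the arc of c_q inside C_j spans phi - theta to phi + theta.
phi = direction(0.0, 0.0, 25.0, 0.0)
theta = half_angle(25.0, 10.0, 30.0)
print(classify_ring(25.0, 10.0, 30.0),
      math.degrees(phi - theta), math.degrees(phi + theta))
```

Sectors sec_p whose angular interval falls inside phi ± theta are the candidates for a_qp = 1 in row q; how partially covered sectors are resolved is governed by Eqs. (16) and (17), which this sketch does not reproduce.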
As stated in "Converting sensing ranges of sensor nodes to digital matrix", at t = 0 each sensor node shares its information, such as its identifier (ID_i), spatial coordinates (x_i, y_i), remaining energy (E_residual_i), scheduling state (T_Scheduling), and sensing radius (RS_i), with its neighbors, and stores their information in Table_neighbor, shown in Table 2. This information is used in the learning process.
In the following, we describe the components of this scheduling mechanism.
Agent. Each sensor node (SN_i) plays the role of the agent.
Environment. The network plays the role of the environment.
State. The state of SN_i is its overlapping area O_i (see Fig. 6), which is calculated based on Eq. (18), where Area_qp, the area of sector p in circle c_q, is calculated through Eq. (19). After merging Eqs. (19) and (18), O_i is expressed in terms of the digital matrix DigitC_i, where m × n is the size of DigitC_i and RS_i is the sensing radius of SN_i.
Action. The action is the set of all activities that the agent SN_i can perform in state O_i. Assume that the scheduling round (T_Scheduling) includes ten time slots (Ts_k, k = 1, ..., 10). Each sensor node can be active in some of these time slots (Ts_k = T_ON) and asleep in the others (Ts_k = T_OFF), so that T_Scheduling = Σ_{k=1}^{10} Ts_k. An example is presented in Table 2. The learning algorithm aims to find the best possible schedule for each sensor node SN_i; the action of SN_i at iteration t is therefore T^t_Scheduling = [Ts_1, Ts_2, ..., Ts_10]^t.
Reward. The reward is the environment's feedback on the action (T_Scheduling) performed by the agent SN_i in state O_i. If the action is successful, the feedback is positive; otherwise, it is negative. In CoWSN, we consider two parameters for calculating the reward function: the remaining energy (E_residual) and the distance between each node and the BS (D_i−BS). The energy parameter is chosen so that high-energy nodes receive a positive reward and increase their Q-value, keeping them in the ON mode for more time slots, whereas low-energy nodes receive a negative reward and decrease their Q-value, so they are active in fewer time slots.
The energy parameter therefore balances energy consumption in the network. The distance parameter is chosen so that sensor nodes close to the BS receive a positive reward and increase their Q-value, staying in the ON mode for more time slots, because these nodes perform more operations than other nodes. The reward function is calculated using Eq. (24), in which (x_i, y_i) and (x_BS, y_BS) are the spatial coordinates of SN_i and the BS, respectively. µ_d is the average distance of the active neighboring nodes to the BS, obtained from the Table_neighbor of SN_i, and σ_d is the standard deviation of these distances at the current iteration. E_residual is the remaining energy of SN_i, and µ_e and σ_e are the average energy and the energy standard deviation of the active neighbors at the current iteration, respectively. ω is a weight coefficient.
In Eqs. (25), (26), (27), and (28), m is the number of active neighboring nodes of SN_i at the current iteration.
Convergence condition. This is the time interval the learning algorithm needs to reach the optimal response. In CoWSN, the algorithm is considered to have converged when the response does not change over the last five iterations. Finally, the optimal response (the schedule determined for SN_i) is stored.
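The state, action, reward, and convergence components can be sketched together under several loudly stated assumptions. O_i uses a closed form for Area_qp that holds only if r_q = q·RS_i/m, namely Area_qp = (2q − 1)πRS_i²/(m²n); the `reward` function is a HYPOTHETICAL z-score combination chosen to match the described behavior (more energy and shorter distance to the BS give a larger reward), not the paper's Eq. (24); population standard deviation and ω = 0.5 are also assumptions. All function names are hypothetical.

```python
import itertools
import math
import statistics

N_SLOTS = 10  # ten time slots per scheduling round, as in the paper

# Action space: every ON(1)/OFF(0) pattern over the ten slots.
ACTIONS = list(itertools.product((0, 1), repeat=N_SLOTS))


def cell_area(q, rs, m, n):
    """Area of a ring-q cell (q = 1..m), assuming r_q = q * rs / m."""
    return (2 * q - 1) * math.pi * rs ** 2 / (m ** 2 * n)


def overlap_area(matrix, rs):
    """State O_i: summed area of the cells marked 1 in the m x n matrix."""
    m, n = len(matrix), len(matrix[0])
    return sum(cell_area(q + 1, rs, m, n)
               for q in range(m) for p in range(n) if matrix[q][p] == 1)


def mean_std(values):
    """Mean and population standard deviation over the active neighbors."""
    return statistics.fmean(values), statistics.pstdev(values)


def reward(e_res, xy, bs_xy, nbr_energy, nbr_dist, omega=0.5):
    """HYPOTHETICAL z-score reward; NOT the paper's Eq. (24)."""
    d_bs = math.dist(xy, bs_xy)
    mu_e, sd_e = mean_std(nbr_energy)
    mu_d, sd_d = mean_std(nbr_dist)
    z_e = (e_res - mu_e) / sd_e if sd_e > 0 else 0.0  # > 0: more energy than neighbors
    z_d = (mu_d - d_bs) / sd_d if sd_d > 0 else 0.0   # > 0: closer to the BS
    return omega * z_e + (1 - omega) * z_d


def converged(history, window=5):
    """True once the last `window` schedules are identical."""
    return len(history) >= window and len(set(history[-window:])) == 1


print(len(ACTIONS), "candidate schedules per node")
print(round(reward(3.0, (0, 0), (0, 10), [1.0, 2.0, 3.0], [20.0, 30.0, 40.0]), 3))
```

Enumerating all 2^10 = 1024 schedules keeps a per-node Q-table small; a larger number of slots would quickly make a tabular approach impractical.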
Algorithm 2 presents the pseudo-code of the scheduling mechanism.
Node replacement. After the network is launched, the sensor nodes carry out their activities according to the time slots specified in the scheduling round, which drains their energy. In this section, we predict the death time of nodes to prevent possible holes in the network, so that the data transmission process between the nodes and the BS is not disrupted. To achieve this, each sensor node (for example, SN_i) periodically updates its Priority_i based on Eq. (29) to determine its importance for replacement.
In Eq. (29), the non-overlapping area of SN_i (its sensing area π·RS_i² minus O_i) is used, where O_i is the overlapping area of SN_i obtained through Eq. (18) and RS_i is the sensing radius of SN_i.
The ratio Packet_size_i / Buffer_size_i measures the data traffic in SN_i, where Packet_size_i is the number of packets in the buffer of SN_i at a given time and Buffer_size_i is the buffer size of SN_i.
When the energy of SN_i drops below a threshold, the node sends a warning message with its Priority_i to the BS. The BS then compares Priority_i with P_Threshold, a positive constant threshold on priority, to decide whether to replace the node:
• If Priority_i > P_Threshold, SN_i is important to replace, because its death would disrupt normal network operations. The BS therefore sends a Coverage message to the mobile node closest to SN_i so that it replaces this node.
• If Priority_i ≤ P_Threshold, the death of SN_i does not disrupt normal network operations, so the BS does not replace SN_i.
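The BS-side decision can be sketched as follows. Eq. (29) is not reproduced in this section, so `priority` is a HYPOTHETICAL weighted mix (weight w = 0.5, assumed) of the two terms the text names, the non-overlapping fraction of the sensing disk and buffer occupancy; the threshold test and the choice of the closest mobile node follow the rules above. Function names are hypothetical.

```python
import math


def priority(o_i, rs_i, packets, buffer_size, w=0.5):
    """HYPOTHETICAL stand-in for Eq. (29): mix of the non-overlapping
    fraction of the sensing disk and the buffer occupancy (both in [0, 1])."""
    disk = math.pi * rs_i ** 2
    non_overlap = (disk - o_i) / disk
    traffic = packets / buffer_size
    return w * non_overlap + (1 - w) * traffic


def handle_warning(node_xy, prio, p_threshold, mobile_nodes):
    """BS decision: return the closest mobile node if SN_i must be replaced."""
    if prio <= p_threshold:
        return None  # death of SN_i does not disrupt normal operation
    if not mobile_nodes:
        return None  # no spare mobile node available
    return min(mobile_nodes, key=lambda mxy: math.dist(mxy, node_xy))


prio = priority(o_i=0.0, rs_i=30.0, packets=8, buffer_size=10)
print(prio, handle_warning((0, 0), prio, 0.5, [(10, 10), (3, 4), (50, 0)]))
```

A node with little overlap and a full buffer scores high, matching the intuition that its loss would both open a coverage hole and drop traffic.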

Simulation and result evaluation
In this section, we simulate CoWSN with NS2 to evaluate its performance and compare the results with the scheme of Rahmani et al. 5, CCM-RL 32, and CCA 33. The network includes 250 to 2000 heterogeneous sensor nodes, randomly distributed over a 1000 m × 1000 m area. The nodes have different sensing ranges (25, 30, and 35 m) and communication ranges (50, 60, and 70 m). Active nodes draw a current of 57 mA, and inactive (sleeping) nodes draw 0.40 µA. Table 3 summarizes the simulation parameters. We evaluate CoWSN in terms of four metrics: the average number of active sensor nodes, coverage rate, energy consumption, and network lifetime.
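A quick calculation with the stated currents shows why putting nodes to sleep dominates the energy savings. This sketch assumes a node draws only the active current in ON slots and only the sleep current in OFF slots over the paper's ten-slot round, ignoring extra radio transmit/receive costs; `avg_current_ma` is a hypothetical helper.

```python
ACTIVE_MA = 57.0      # current drawn by an active node (mA)
SLEEP_MA = 0.40e-3    # current drawn by a sleeping node: 0.40 uA in mA
N_SLOTS = 10          # time slots per scheduling round


def avg_current_ma(on_slots):
    """Average current over one round for a node ON in `on_slots` slots."""
    off_slots = N_SLOTS - on_slots
    return (on_slots * ACTIVE_MA + off_slots * SLEEP_MA) / N_SLOTS


for k in (10, 5, 3):
    print(f"ON in {k:2d}/10 slots -> {avg_current_ma(k):.4f} mA")
print(f"active/sleep current ratio: {ACTIVE_MA / SLEEP_MA:,.0f}")
```

Since sleeping costs about five orders of magnitude less than being active, the average current scales almost linearly with the number of ON slots.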
The average number of active sensor nodes. This metric counts the active nodes selected to cover the Region of Interest (RoI). As shown in Fig. 7, CoWSN needs the fewest active nodes in a scheduling period: it reduces the number of active nodes by 7.67%, 11.04%, and 13.32% compared to Rahmani et al., CCM-RL, and CCA, respectively. This is because CoWSN uses a Q-learning-based scheduling mechanism to determine the activity time of the sensor nodes. Both CoWSN and the scheme of Rahmani et al. calculate the overlap between a sensor node and its neighbors precisely; however, the scheme of Rahmani et al. uses a fuzzy scheduling mechanism to calculate the activity time of nodes, which performs worse than our method. CCM-RL considers only the distance parameter when calculating the overlap of nodes, which is imprecise and can introduce large errors, and CCA provides no approach for calculating the overlap between nodes. In addition, CoWSN considers two parameters, energy and distance to the base station, to determine the best schedule for sensor nodes. CCA focuses on the energy parameter in the scheduling process but ignores the overlap between nodes, and CCM-RL does not consider node energy in the learning process, which is an important weakness. According to Fig. 7, the average number of active nodes increases with the number of nodes in all methods. A coverage method cannot keep all nodes active at all times, because they would consume a lot of energy and die quickly, which reduces network lifetime. As a result, when node density is low, the active nodes cannot cover the whole RoI and the coverage rate decreases. As node density increases, each method activates more nodes.
Therefore, the coverage quality of the RoI improves.
Coverage rate. […] CCM-RL, and CCA, respectively. This is because our method calculates the overlap between nodes accurately and penalizes nodes with more overlap: these nodes receive lower rewards and therefore sleep in more time slots. This helps CoWSN achieve the best coverage rate with the fewest active nodes. Moreover, CoWSN predicts the death of sensor nodes and replaces them in time to prevent loss of coverage quality. CCM-RL and CCA provide no approach for replacing dead nodes, whereas the scheme of Rahmani et al. addresses this issue using SFLA. According to Fig. 8, when the number of active nodes exceeds 350, CoWSN achieves a coverage rate above 90%, whereas the scheme of Rahmani et al. reaches 88% for the same number of active nodes, and this rate does not change as the number of active nodes increases further. In CCM-RL, the coverage rate improves as the number of active nodes increases, but at best it reaches 86%. In CCA, the coverage rate is 77% for 350 or more active nodes.
Energy consumption. As shown in Fig. 9, CoWSN reduces energy consumption by 27.27%, 51.51%, and 70.19% compared to Rahmani et al., CCM-RL, and CCA, respectively. This is because our Q-learning-based scheduling mechanism includes the energy parameter in the reward function: high-energy nodes are rewarded and increase their Q-values so that they stay active longer, while low-energy nodes are penalized so that they are active for a shorter time. The scheme of Rahmani et al. also considers the energy parameter when scheduling sensor nodes. However, it has high communication overhead because it uses fuzzy logic in the scheduling mechanism, and it uses SFLA to cover holes created in the network, which further increases its energy consumption. CCM-RL ranks third in energy consumption. It does not consider the energy parameter in the scheduling process, although its sensing range customization mechanism helps it use energy efficiently. CCA has the worst energy consumption because of its high communication overhead.
Network lifetime. Figure 10 compares the methods in terms of network lifetime. In this experiment, we assume 200 alive nodes in the network, which consume their energy over time.
CoWSN improves network lifetime by 9.55%, 32.94%, and 36.32% compared with Rahmani et al., CCM-RL, and CCA, respectively, for the reasons described in "Energy consumption". Because CoWSN takes the energy parameter into account in the scheduling process, it distributes energy consumption evenly among the sensor nodes.

Conclusion
In this paper, we presented an area coverage scheme called CoWSN to intelligently monitor gas and oil pipelines. CoWSN aims to reduce energy consumption, improve network lifetime, and achieve the highest coverage rate in the network. To achieve these goals, we used a digital matrix-based technique to calculate the overlap between each sensor node and its neighboring nodes. We then designed a Q-learning-based scheduling mechanism to determine the activity time of each sensor node. CoWSN also uses a strategy to detect the death of nodes in time and prevent holes in the network. CoWSN was simulated with NS2 and compared with the scheme of Rahmani et al., CCM-RL, and CCA, and the simulation results show that it outperforms these methods. Note that we used a WSN platform (not a real gas or oil pipeline network) to simulate our scheme. In future work, we plan to evaluate our method in real gas or oil pipeline environments and under more scenarios to characterize its performance further. We also plan to improve its efficiency using other machine learning (ML) techniques and evolutionary algorithms (EAs).